
    Guessers for Finite-State Transducer Lexicons

    Language software applications encounter new words, e.g., acronyms, technical terminology, names, or compounds of such words. To add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new, generally applicable method for creating an entry generator, i.e., a paradigm guesser, for finite-state transducer lexicons. Since a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on full-scale Finnish, English and Swedish transducer lexicons. We use the open-source Helsinki Finite-State Technology (HFST) to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82–87% and a precision of 71–76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.
    Peer reviewed
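To illustrate why ranking matters for a paradigm guesser, here is a minimal sketch in Python. It is not the method of the paper, which derives guessers from HFST transducer lexicons; it is a simpler longest-suffix-match heuristic, assuming a toy lexicon that maps known words to hypothetical paradigm labels. Candidate paradigms for an unknown word are ordered by the length of the longest final suffix they share with a known word, with ties broken by how many known words support that paradigm.

```python
def guess_paradigms(word, lexicon, max_suffix=6):
    """Return candidate paradigm labels for `word`, best first.

    Assumption: `lexicon` is a plain dict {known_word: paradigm_label};
    the labels used below are hypothetical, not from any real lexicon.
    """
    scores = {}  # paradigm -> (best suffix length, number of supporting words)
    for known, paradigm in lexicon.items():
        # Length of the longest common final suffix, capped at max_suffix.
        n = 0
        limit = min(len(word), len(known), max_suffix)
        while n < limit and word[-1 - n] == known[-1 - n]:
            n += 1
        if n == 0:
            continue  # no shared ending: this word gives no evidence
        best, support = scores.get(paradigm, (0, 0))
        if n > best:
            scores[paradigm] = (n, 1)
        elif n == best:
            scores[paradigm] = (n, support + 1)
    # Longer shared suffix first, then more support, then label for determinism.
    return sorted(scores, key=lambda p: (-scores[p][0], -scores[p][1], p))


lexicon = {
    "walk": "V_REG", "talk": "V_REG",
    "sing": "V_STRONG", "ring": "V_STRONG",
    "bring": "V_BRING",
}
print(guess_paradigms("balk", lexicon))  # ['V_REG']
print(guess_paradigms("zing", lexicon))  # ['V_STRONG', 'V_BRING']
```

Putting the best-supported candidates first reflects the point made in the abstract: a guesser is useful in practice only if the correct paradigm appears among the first few suggestions.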

    Language and Dialect Identification of Cuneiform Texts

    This article introduces a corpus of cuneiform texts, from which the dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived, and reports some preliminary language identification experiments conducted on that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results on the CLI dataset. To the best of our knowledge, the experiments detailed here are the first application of automatic language identification methods to cuneiform data.
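As an illustration of what a simple language identification baseline can look like, here is a minimal sketch of a character n-gram classifier (multinomial naive Bayes with add-one smoothing) in Python. It is a generic textbook technique, not the baseline used in the article or the HeLI method; it assumes the input is transliterated text as plain strings, and the training strings and labels below are toy placeholders.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Character n-grams of `text`, padded with spaces at both ends."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Multinomial naive Bayes over character n-grams, uniform prior."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.totals = Counter()             # language -> total n-gram tokens
        self.vocab = set()                  # all n-grams seen in training

    def train(self, samples):
        """`samples` is an iterable of (text, language_label) pairs."""
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.totals[lang] += len(grams)
            self.vocab.update(grams)

    def predict(self, text):
        """Return the label with the highest smoothed log-likelihood."""
        grams = char_ngrams(text, self.n)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        best_lang, best_score = None, -math.inf
        for lang in sorted(self.counts):
            score = sum(
                math.log((self.counts[lang][g] + 1) / (self.totals[lang] + v))
                for g in grams
            )
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


clf = NgramLanguageIdentifier(n=3)
clf.train([("aaa aab aba", "A"), ("zzz zzy zyz", "Z")])
print(clf.predict("aab"))  # A
```

Character n-grams are a common choice for such baselines because they need no tokenizer or lexicon, which matters for scripts and transliterations with few existing tools.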

    Improving Word Association Measures in Repetitive Corpora with Context Similarity Weighting

    Peer reviewed

    Weighted Finite-State Morphological Analysis of Finnish Compounding with HFST-LEXC

    Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 89-95. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9206

    FIN-CLARIN: a research infrastructure for the humanities with an emphasis on language

    Billions of words and thousands of hours of audio and video are needed as material for research in the humanities, and for language research in particular. Researchers also need tools to refine their own data collections and to compare them with public data collections. When a research project ends, storage and distribution sites are needed to make the raw data, tools and research results accessible and usable. Together, data, tools and shared means of use form a research infrastructure, which makes it possible to verify earlier results and to make new findings more efficiently, since not everyone has to start from scratch collecting data and building analysis tools.

    Laundry Symbols and License Management: Practical Considerations for the Distribution of LRs based on experiences from CLARIN

    One of the most challenging tasks in building language resources is copyright license management, for several reasons. First, the current European copyright system is designed largely to satisfy commercial actors, e.g., publishers and record companies. As a result, the scope and duration of the rights are very extensive, and there are even certain forms of protection, e.g., the database right, that do not exist elsewhere in the world. On the other hand, the exceptions for research and teaching are typically very narrow.
    Peer reviewed

    HeLI-OTS, Off-the-shelf Language Identifier for Text

    Peer reviewed

    The CLARIN Committee for Legal and Ethical Issues and the Normative Layer of the CLARIN Infrastructure: Ville Oksanen, in memoriam (26 December 1976 - 23 November 2014)

    Publisher Copyright: © 2022 Darja Fišer and Andreas Witt, published by Walter de Gruyter GmbH, Berlin/Boston. All rights reserved.
    Peer reviewed